Abstract — Low-complexity near-optimal detection of large-MIMO signals has attracted recent research. Recently, we proposed a local neighborhood search algorithm, namely reactive tabu search (RTS) algorithm, as well as a factor-graph based belief propagation (BP) algorithm for low-complexity large-MIMO detection. The motivation for the present work arises from the following two observations on the above two algorithms: i) RTS works for general $M$-QAM. Although RTS was shown to achieve close to optimal performance for 4-QAM in large dimensions, significant performance improvement was still possible for higher-order QAM (e.g., 16- and 64-QAM). ii) BP also was shown to achieve near-optimal performance for large dimensions, but only for {±1} alphabet. In this paper, we improve the large-MIMO detection performance of higher-order QAM signals by using a hybrid algorithm that employs RTS and BP. In particular, motivated by the observation that when a detection error occurs at the RTS output, the least significant bits (LSB) of the symbols are mostly in error, we propose to first reconstruct and cancel the interference due to bits other than LSBs at the RTS output and feed the interference cancelled received signal to the BP algorithm to improve the reliability of the LSBs. The output of the BP is then fed back to RTS for the next iteration. Our simulation results show that in a 32 × 32 V-BLAST system, the proposed RTS-BP algorithm performs better than RTS by about 3.5 dB at $10^{-3}$ uncoded BER and by about 2.5 dB at $3 \times 10^{-4}$ rate-3/4 turbo coded BER with 64-QAM at the same order of complexity as RTS. We also illustrate the performance of large-MIMO detection in frequency-selective fading channels.

Keywords — Large-MIMO signal detection, reactive tabu search, belief 
propagation, higher-order QAM. 

I. Introduction 
Multiple-input multiple-output (MIMO) systems with a large number (e.g., tens) of transmit and receive antennas, referred to as 'large-MIMO systems,' are of interest because of the high capacities/spectral efficiencies theoretically predicted in these systems [1], [2]. Research in low-complexity receive processing (e.g., MIMO detection) techniques that can lead to practical realization of large-MIMO systems is both nascent and promising. For example, NTT DoCoMo has already field demonstrated a 12 × 12 V-BLAST system operating at 5 Gbps data rate and 50 bps/Hz spectral efficiency in the 5 GHz band at a mobile speed of 10 km/h [3]. Evolution of WiFi standards (evolution from IEEE 802.11n to IEEE 802.11ac to achieve multi-gigabit rate transmissions in the 5 GHz band) now considers 16 × 16 MIMO operation; see the 16 × 16 MIMO indoor channel sounding measurements at 5 GHz reported in [4] for consideration in WiFi standards. Also, 64 × 64 MIMO channel sounding measurements at 5 GHz in indoor environments have been reported in [5]. We note that, while RF/antenna technologies/measurements for large-MIMO systems are maturing, there is an increasing need to focus on low-complexity algorithms for detection in large-MIMO systems to reap their high spectral efficiency benefits.



In the above context, in our recent works, we have shown that certain algorithms from machine learning/artificial intelligence achieve near-optimal performance in large-MIMO systems at low complexities [6]-[12].^1 In [6]-[8], a local neighborhood search based algorithm, namely, a likelihood ascent search (LAS) algorithm, was proposed and shown to achieve close to maximum-likelihood (ML) performance in MIMO systems with several tens of antennas (e.g., 32 × 32 and 64 × 64 MIMO). Subsequently, in [9], [10], another local search algorithm, namely, the reactive tabu search (RTS) algorithm, which performed better than the LAS algorithm through the use of a local minima exit strategy, was presented.^2 In [11], near-ML performance in a 50 × 50 MIMO system was demonstrated using a Gibbs sampling based detection algorithm, where the symbols take values from {±1}. More recently, in [12], we proposed a factor graph based belief propagation (BP) algorithm for large-MIMO detection, where we adopted a Gaussian approximation of the interference (GAI).

The motivation for the present work arises from the following two observations on the RTS and BP algorithms in [9], [10] and [12]: i) RTS works for general $M$-QAM. Although RTS was shown to achieve close to ML performance for 4-QAM in large dimensions, significant performance improvement was still possible for higher-order QAM (e.g., 16- and 64-QAM). ii) BP also was shown to achieve near-optimal performance for large dimensions, but only for the {±1} alphabet. In this paper, we improve the large-MIMO detection performance of higher-order QAM signals by using a hybrid algorithm that employs RTS and BP. In particular, we observed that when a detection error occurs at the RTS output, the least significant bits (LSB) of the symbols are mostly in error. Motivated by this observation, we propose to first reconstruct and cancel the interference due to bits other than the LSBs at the RTS output and feed the interference cancelled received signal to the BP algorithm to improve the reliability of the LSBs. The output of the BP is then fed back to the RTS for the next iteration. Our simulation results show that the proposed RTS-BP algorithm achieves better uncoded as well as coded BER performance than RTS in large-MIMO systems with higher-order QAM (e.g., RTS-BP performs better by about 3.5 dB at $10^{-3}$ uncoded BER and by about 2.5 dB at $3 \times 10^{-4}$ rate-3/4 turbo coded BER in 32 × 32 V-BLAST with 64-QAM) at the same order of complexity as RTS.

The rest of this paper is organized as follows. In Sec. II, we introduce the RTS and BP algorithms in [9], [10] and [12] and the motivation for the current work. The proposed hybrid RTS-BP algorithm and its performance are presented in Secs. III and IV. Conclusions are given in Sec. V.

^1 Similar algorithms have been reported earlier in the context of multiuser detection in large CDMA systems.

^2 In [8], [10], we compared the performance and complexities of LAS and RTS algorithms with those of the sphere decoding (SD) variants in [13] and [14], and showed that these SD variants do not scale well for the large dimensions considered.

II. RTS and BP Algorithms for Large-MIMO
Detection

Consider an $N_t \times N_r$ V-BLAST MIMO system whose received signal vector, $\mathbf{y}_c \in \mathbb{C}^{N_r}$, is of the form

$$\mathbf{y}_c = \mathbf{H}_c \mathbf{x}_c + \mathbf{n}_c, \qquad (1)$$

where $\mathbf{x}_c \in \mathbb{C}^{N_t}$ is the symbol vector transmitted, $\mathbf{H}_c \in \mathbb{C}^{N_r \times N_t}$ is the channel gain matrix, and $\mathbf{n}_c \in \mathbb{C}^{N_r}$ is the noise vector whose entries are modeled as i.i.d. $\mathcal{CN}(0, \sigma^2)$. Assuming rich scattering, we model the entries of $\mathbf{H}_c$ as i.i.d. $\mathcal{CN}(0, 1)$. Each element of $\mathbf{x}_c$ is an $M$-PAM or $M$-QAM symbol. $M$-PAM symbols take values from $\{A_m, m = 1, 2, \cdots, M\}$, where $A_m = (2m - 1 - M)$, and $M$-QAM is nothing but two PAMs in quadrature. As in [9], we convert (1) into a real-valued system model, given by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (2)$$

where $\mathbf{H} \in \mathbb{R}^{2N_r \times 2N_t}$, $\mathbf{y} \in \mathbb{R}^{2N_r}$, $\mathbf{x} \in \mathbb{R}^{2N_t}$, $\mathbf{n} \in \mathbb{R}^{2N_r}$. For $M$-QAM, $[x_1, \cdots, x_{N_t}]$ can be viewed to be from an underlying $M$-PAM signal set, and so is $[x_{N_t+1}, \cdots, x_{2N_t}]$. Let $\mathbb{A}_i$ denote the $M$-PAM signal set from which $x_i$ takes values, $i = 1, 2, \cdots, 2N_t$. Defining a $2N_t$-dimensional signal space $\mathbb{S}$ to be the Cartesian product of $\mathbb{A}_1$ to $\mathbb{A}_{2N_t}$, the ML solution vector, $\mathbf{x}_{ML}$, is given by

$$\mathbf{x}_{ML} = \arg\min_{\mathbf{x} \in \mathbb{S}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2, \qquad (3)$$

whose complexity is exponential in $N_t$. The RTS algorithm in [9], [10] is a low-complexity algorithm, which minimizes the ML metric in (3) through a local neighborhood search.
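As an illustration of the model in (1)-(3), the following minimal sketch stacks real and imaginary parts to form the real-valued system and then solves (3) by exhaustive search. It assumes the usual stacking $\mathbf{x} = [\Re(\mathbf{x}_c); \Im(\mathbf{x}_c)]$; the function names (`to_real_model`, `pam_alphabet`, `ml_detect`) and the tiny noiseless 2 × 2 setup are hypothetical choices made only for this example. The exhaustive search is feasible only for very small dimensions, which is exactly why low-complexity search such as RTS is of interest.

```python
import itertools
import numpy as np

def to_real_model(Hc, yc):
    """Real-valued equivalent of y_c = H_c x_c + n_c with x = [Re(x_c); Im(x_c)]."""
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag,  Hc.real]])
    y = np.concatenate([yc.real, yc.imag])
    return H, y

def pam_alphabet(M):
    """PAM levels A_m = 2m - 1 - M for m = 1..M."""
    return np.array([2 * m - 1 - M for m in range(1, M + 1)], dtype=float)

def ml_detect(H, y, alphabet):
    """Exhaustive minimization of ||y - Hx||^2 over the Cartesian product of PAM sets."""
    best_x, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=H.shape[1]):
        x = np.array(cand)
        metric = float(np.sum((y - H @ x) ** 2))
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, best_metric

# Tiny noiseless example: 2 x 2 complex system, 4 levels per real dimension.
rng = np.random.default_rng(0)
Nt = Nr = 2
Hc = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
alphabet = pam_alphabet(4)
x_true = rng.choice(alphabet, size=2 * Nt)
yc = Hc @ (x_true[:Nt] + 1j * x_true[Nt:])
H, y = to_real_model(Hc, yc)
x_ml, metric = ml_detect(H, y, alphabet)   # searches 4**4 = 256 candidates
```

The search space grows as the alphabet size raised to the power $2N_t$, so the same loop is out of reach for the 32 × 32 systems considered in this paper.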

A. RTS Algorithm 

A detailed description of the RTS algorithm for large-MIMO detection is available in [9], [10]. Here, we present a brief summary of the key aspects of the algorithm, and its 16- and 64-QAM performance that motivates the current work.

Fig. 1. Uncoded BER performance of RTS algorithm in 32 × 32 V-BLAST for 4-, 16-, 64-QAM. Performance improvement is possible in 16-, 64-QAM.

The RTS algorithm starts with an initial solution vector, defines a neighborhood around it (i.e., defines a set of neighboring vectors based on a neighborhood criterion), and moves to the best vector among the neighboring vectors (even if the best neighboring vector is worse, in terms of likelihood, than the current solution vector; this allows the algorithm to escape from local minima). This process is continued for a certain number of iterations, after which the algorithm is terminated and the best among the solution vectors in all the iterations is declared as the final solution vector. In defining the neighborhood of the solution vector in a given iteration, the algorithm attempts to avoid cycling by making the moves to solution vectors of the past few iterations 'tabu' (i.e., it prohibits these moves), which ensures efficient search of the solution space. The number of these past iterations is parametrized as the 'tabu period.' The search is referred to as fixed tabu search if the tabu period is kept constant. If the tabu period is dynamically changed (e.g., increased when more repetitions of the solution vectors are observed in the search path), then the search is called reactive tabu search. We consider reactive tabu search because of its robustness (choice of a good fixed tabu period can be tedious). The per-symbol complexity of RTS for detection of V-BLAST signals is $O(N_t N_r)$.
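The search loop described above can be sketched as follows. This is a simplified fixed-period tabu search, not the reactive algorithm of [9], [10]: the neighborhood (one coordinate moved to an adjacent alphabet level), the aspiration rule (a tabu move is allowed if it beats the best metric so far), the fixed `tabu_period`, and the fixed iteration count are all assumptions made for illustration, and the name `tabu_search` is hypothetical.

```python
import numpy as np

def tabu_search(H, y, alphabet, x0, max_iter=50, tabu_period=3):
    """Simplified fixed-period tabu search on the ML metric ||y - Hx||^2."""
    levels = np.sort(np.asarray(alphabet, dtype=float))
    x = np.array(x0, dtype=float)
    cost = lambda v: float(np.sum((y - H @ v) ** 2))
    best_x, best_cost = x.copy(), cost(x)
    tabu = {}  # (coordinate, value) -> remaining iterations for which the move is tabu
    for _ in range(max_iter):
        candidates = []
        for i in range(len(x)):
            idx = int(np.argmin(np.abs(levels - x[i])))
            for nidx in (idx - 1, idx + 1):          # adjacent alphabet levels only
                if 0 <= nidx < len(levels):
                    v = x.copy()
                    v[i] = levels[nidx]
                    c = cost(v)
                    is_tabu = tabu.get((i, levels[nidx]), 0) > 0
                    if not is_tabu or c < best_cost:  # aspiration: accept a new best
                        candidates.append((c, i, v))
        # Age the tabu list by one iteration.
        tabu = {k: t - 1 for k, t in tabu.items() if t > 1}
        if not candidates:
            continue
        c, i, v = min(candidates, key=lambda t: t[0])
        tabu[(i, x[i])] = tabu_period  # forbid moving coordinate i back to its old value
        x = v                          # move even if the metric got worse
        if c < best_cost:
            best_x, best_cost = x.copy(), c
    return best_x, best_cost

# Small usage example with hypothetical parameters.
rng = np.random.default_rng(0)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])
H = rng.standard_normal((8, 4))
x_true = rng.choice(alphabet, size=4)
y = H @ x_true + 0.1 * rng.standard_normal(8)
x0 = np.full(4, -3.0)
initial_cost = float(np.sum((y - H @ x0) ** 2))
x_hat, final_cost = tabu_search(H, y, alphabet, x0)
```

Because the best vector seen over all iterations is returned, the final metric can never exceed that of the initial vector, even though individual moves are allowed to increase the metric.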

1) Motivation of Current Work: Figure 1 shows the uncoded BER performance of RTS using the algorithm parameters optimized through simulations for 4-, 16-, and 64-QAM in a 32 × 32 V-BLAST system. As lower bounds on the error performance in MIMO, the SISO AWGN performance for 4-, 16-, and 64-QAM are also plotted. It can be seen that, in the case of 4-QAM, the RTS performance is just about 0.5 dB away from the SISO AWGN performance at $10^{-3}$ BER. However, the gap between RTS performance and SISO AWGN performance at $10^{-3}$ BER widens for 16-QAM and 64-QAM; the gap is 7.5 dB for 16-QAM and 16.5 dB for 64-QAM. This gap can be viewed as a potential indicator of the amount of improvement in performance possible further. A more appropriate indicator will be the gap between RTS performance and the ML performance. Since simulation of sphere decoding (SD) of 32 × 32 V-BLAST with 16- and 64-QAM (64 real dimensions) is computationally intensive, we do not show the SD (ML) performance. Nevertheless, the widening gap of RTS performance from SISO AWGN performance for 16- and 64-QAM seen in Fig. 1 motivated us to explore improved algorithms to achieve better performance than RTS performance for higher-order QAM.

B. BP Algorithm Based on GAI 

Fig. 2. Message passing between variable nodes and observation nodes.

In [12], we presented a detection algorithm based on BP on factor graphs of MIMO systems. In (2), each entry of the vector $\mathbf{y}$ is treated as a function node (observation node), and each symbol, $x_k \in \{\pm 1\}$, as a variable node. A key ingredient in the BP algorithm in [12], which contributes to its low complexity, is the Gaussian approximation of interference (GAI), where the interference plus noise term, $z_{ik}$, in

$$y_i = h_{ik} x_k + \underbrace{\sum_{j=1, j \neq k}^{2N_t} h_{ij} x_j + n_i}_{\triangleq\, z_{ik}}, \qquad (4)$$

is modeled as $\mathcal{CN}(\mu_{z_{ik}}, \sigma^2_{z_{ik}})$ with

$$\mu_{z_{ik}} = \sum_{j=1, j \neq k}^{2N_t} h_{ij}\, \mathbb{E}(x_j) \quad \text{and} \quad \sigma^2_{z_{ik}} = \sum_{j=1, j \neq k}^{2N_t} |h_{ij}|^2\, \mathrm{Var}(x_j) + \frac{\sigma^2}{2},$$

where $h_{ij}$ is the $(i, j)$th element in $\mathbf{H}$. With $x_k$'s $\in \{\pm 1\}$, the log-likelihood ratio (LLR) of $x_k$ at observation node $i$, denoted by $\Lambda_i^k$, is

$$\Lambda_i^k = \log \frac{p(y_i | \mathbf{H}, x_k = 1)}{p(y_i | \mathbf{H}, x_k = -1)} = \frac{4}{\sigma^2_{z_{ik}}}\, \Re\big(h_{ik}^* (y_i - \mu_{z_{ik}})\big). \qquad (5)$$

The LLR values computed at the observation nodes are passed to the variable nodes (as shown in Fig. 2). Using these LLRs, the variable nodes compute the probabilities

$$p_i^{k+} \triangleq p_i(x_k = +1 | \mathbf{y}) = \frac{\exp\big(\sum_{l \neq i} \Lambda_l^k\big)}{1 + \exp\big(\sum_{l \neq i} \Lambda_l^k\big)}, \qquad (6)$$

and pass them back to the observation nodes (Fig. 2). This message passing is carried out for a certain number of iterations. At the end, $x_k$ is detected as

$$\hat{x}_k = \mathrm{sgn}\Big(\sum_{i=1}^{2N_r} \Lambda_i^k\Big). \qquad (7)$$

It has been shown in [12] that this BP algorithm with GAI, like LAS and RTS algorithms, exhibits 'large-system behavior,' where the bit error performance improves with increasing number of dimensions. In Fig. 1, the uncoded BER performance of this BP algorithm for 4-QAM (input data vector of size $2N_t$ with elements from {±1}) in 32 × 32 V-BLAST is also plotted. We can see that the performance is almost the same as that of RTS. In terms of complexity, the BP algorithm has the advantage that it does not need to compute an initial solution vector and $\mathbf{H}^T\mathbf{H}$, which are required in RTS. The per-symbol complexity of the BP algorithm for detection in V-BLAST is $O(N_t)$. A limitation of this BP approach is that it is not for general $M$-QAM. However, its good performance with the {±1} alphabet at lower complexities than RTS can be exploited to improve the higher-order QAM performance of RTS, as proposed in the following section.

III. Proposed Hybrid RTS-BP Algorithm for
Large-MIMO Detection

In this section, we highlight the rationale behind the hybrid RTS-BP approach and present the proposed algorithm.

Why Hybrid RTS-BP?

The proposed hybrid RTS-BP approach is motivated by the following observation we made in our RTS BER simulations. We observed that, at moderate to high SNRs, when an RTS output vector is in error, the least significant bits (LSB) of the data symbols are more likely to be in error than other bits. An analytical reasoning for this behavior can be given as follows.

Let $\mathbf{x}$ be the transmit vector and $\hat{\mathbf{x}}$ be the corresponding output of the RTS detector. Let $\mathbb{A} = \{a_1, a_2, \cdots, a_M\}$ denote the $M$-PAM alphabet that $x_i$'s take values from. Consider the symbol-to-bit mapping, where we can write the value of each entry of $\hat{\mathbf{x}}$ as a linear combination of its constituent bits as

$$\hat{x}_i = \sum_{j=0}^{N-1} 2^j\, \hat{b}_i^{(j)}, \quad i = 1, \cdots, 2N_t, \qquad (8)$$



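where $N = \log_2 M$ and $\hat{b}_i^{(j)} \in \{\pm 1\}$. We note that the RTS algorithm outputs a local minimum as the solution vector. So, $\hat{\mathbf{x}}$, being a local minimum, satisfies the following conditions:

$$\|\mathbf{y} - \mathbf{H}\hat{\mathbf{x}}\|^2 \le \|\mathbf{y} - \mathbf{H}(\hat{\mathbf{x}} + \lambda_i \mathbf{e}_i)\|^2, \quad \forall i = 1, \cdots, 2N_t, \qquad (9)$$

where $\lambda_i = (a_q - \hat{x}_i)$, $q = 1, \cdots, M$, and $\mathbf{e}_i$ denotes the $i$th column of the identity matrix. Defining $\mathbf{F} = \mathbf{H}^T\mathbf{H}$, $\mathbf{r} = \mathbf{H}\hat{\mathbf{x}}$, and denoting the $i$th column of $\mathbf{H}$ as $\mathbf{h}_i$, the conditions in (9) reduce to

$$2(\mathbf{y} - \mathbf{r})^T \mathbf{h}_i\, \mathrm{sgn}(\lambda_i) \le \lambda_i f_{ii}\, \mathrm{sgn}(\lambda_i), \qquad (10)$$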

where $f_{ij}$ denotes the $(i, j)$th element of $\mathbf{F}$. Under moderate to high SNR conditions, ignoring the noise, (10) can be further reduced to

$$2(\mathbf{x} - \hat{\mathbf{x}})^T \mathbf{f}_i\, \mathrm{sgn}(\lambda_i) \le \lambda_i f_{ii}\, \mathrm{sgn}(\lambda_i), \qquad (11)$$

where $\mathbf{f}_i$ denotes the $i$th column of $\mathbf{F}$. For Rayleigh fading, $f_{ii}$ is chi-square distributed with $2N_t$ degrees of freedom with mean $N_t$. Approximating the distribution of $f_{ij}$ to be normal with mean zero and variance [...] for $i \neq j$ by central limit theorem, we can drop the $\mathrm{sgn}(\lambda_i)$ in (11). Using the fact that the minimum value of $|\lambda_i|$ is 2, (11) can be simplified as

$$\sum_{j \neq i} \Delta_j f_{ij} \le (1 - \Delta_i)\, f_{ii}, \qquad (12)$$



where $\Delta_j = x_j - \hat{x}_j$. Also, if $x_i = \hat{x}_i$, by the normal approximation in the above [...]

Fig. 3. Proposed hybrid RTS-BP algorithm.

Now, the LHS in (12) being normal with variance proportional to $\sum_{j \neq i} \Delta_j^2$ and the RHS being positive, it can be seen that $\Delta_i$, $\forall i$, take smaller values with higher probability. Hence, the symbols of $\hat{\mathbf{x}}$ are nearest Euclidean neighbors of their corresponding symbols of the global minimum with high probability.^3 Now, because of the symbol-to-bit mapping in (8), $\hat{x}_i$ will differ from its nearest Euclidean neighbors certainly in the LSB position, and may or may not differ in other bit positions. Consequently, the LSBs of the symbols in the RTS output $\hat{\mathbf{x}}$ are least reliable.
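The claim about the LSB can be checked directly from the mapping in (8). The sketch below assumes a PAM alphabet with 8 levels (as in each real dimension of 64-QAM) and uses a hypothetical helper `pam_bits` that inverts (8); it then compares each level with its nearest Euclidean neighbor two units higher.

```python
def pam_bits(x, M):
    """Bits b[j] in {+1, -1} with x = sum_j 2^j * b[j]; b[0] is the LSB (mapping (8))."""
    N = M.bit_length() - 1          # N = log2(M), M assumed to be a power of two
    u = (x + M - 1) // 2            # index of the level, 0 .. M-1
    return [2 * ((u >> j) & 1) - 1 for j in range(N)]

M = 8
levels = [2 * m - 1 - M for m in range(1, M + 1)]   # -7, -5, ..., 7

# The mapping reproduces every level.
for x in levels:
    assert sum(2**j * b for j, b in enumerate(pam_bits(x, M))) == x

# Compare each level with its nearest neighbor x + 2.
lsb_flips = [pam_bits(a, M)[0] != pam_bits(a + 2, M)[0] for a in levels[:-1]]
higher_flips = [pam_bits(a, M)[1:] != pam_bits(a + 2, M)[1:] for a in levels[:-1]]
```

Here every entry of `lsb_flips` is true, while `higher_flips` is true only for some pairs, which matches the statement that a nearest-neighbor error always corrupts the LSB but only sometimes the other bits.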

The above observation then led us to consider improving the reliability of the LSBs of the RTS output using the BP algorithm in [12], and iterate between RTS and BP as follows.

Proposed Hybrid RTS-BP Algorithm: 

Figure 3 shows the block schematic of the proposed hybrid RTS-BP algorithm. The following four steps constitute the proposed algorithm.

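• Step 1: Obtain $\hat{\mathbf{x}}$ using the RTS algorithm. Obtain the output bits $\hat{b}_i^{(j)}$, $i = 1, \cdots, 2N_t$, $j = 0, \cdots, N-1$, from $\hat{\mathbf{x}}$ and (8).

• Step 2: Using the $\hat{b}_i^{(j)}$'s from Step 1, reconstruct the interference from all bits other than the LSBs (i.e., interference from all bits other than $\hat{b}_i^{(0)}$'s) as

$$\tilde{\mathbf{I}} = \sum_{j=1}^{N-1} 2^j\, \mathbf{H}\, \hat{\mathbf{b}}^{(j)}, \qquad (14)$$

where $\hat{\mathbf{b}}^{(j)} = [\hat{b}_1^{(j)}, \hat{b}_2^{(j)}, \cdots, \hat{b}_{2N_t}^{(j)}]^T$. Cancel the reconstructed interference in (14) from $\mathbf{y}$ as

$$\tilde{\mathbf{y}} = \mathbf{y} - \tilde{\mathbf{I}}. \qquad (15)$$

• Step 3: Run the BP-GAI algorithm in Sec. II-B on the vector $\tilde{\mathbf{y}}$ in Step 2, and obtain an estimate of the LSBs. Denote this LSB output vector from BP as $\tilde{\mathbf{b}}^{(0)}$. Now, using $\tilde{\mathbf{b}}^{(0)}$ from the BP output, and the $\hat{\mathbf{b}}^{(j)}$, $j = 1, \cdots, N-1$, from the RTS output in Step 1, reconstruct the symbol vector as

$$\tilde{\mathbf{x}} = \tilde{\mathbf{b}}^{(0)} + \sum_{j=1}^{N-1} 2^j\, \hat{\mathbf{b}}^{(j)}. \qquad (16)$$

• Step 4: Repeat Steps 1 to 3 using $\tilde{\mathbf{x}}$ as the initial vector to the RTS algorithm.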
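Steps 2 and 3 above can be sketched as follows, taking the RTS output as given. This is only an illustration under stated assumptions: the function names are hypothetical, the BP routine is a plain real-valued implementation of the message passing in Sec. II-B with a fixed number of iterations and no damping, and its LLR uses the real-Gaussian form $2 h_{ik}(y_i - \mu_{z_{ik}})/\sigma^2_{z_{ik}}$ rather than the expression in (5). The argument `noise_var` is the assumed noise variance per real dimension.

```python
import numpy as np

def symbols_to_bits(x, M):
    """Bit planes B[j, i] in {+1, -1} with x_i = sum_j 2^j B[j, i] (mapping (8))."""
    N = M.bit_length() - 1                      # N = log2(M), M a power of two
    u = np.rint((np.asarray(x) + M - 1) / 2).astype(int)
    return np.array([2 * ((u >> j) & 1) - 1 for j in range(N)], dtype=float)

def bp_gai(H, y, noise_var, n_iter=10):
    """BP with Gaussian approximation of interference for x in {+1, -1}^n."""
    m, n = H.shape
    p = np.full((m, n), 0.5)                    # p[i, k]: message P(x_k = +1) to node i
    for _ in range(n_iter):
        mean = 2.0 * p - 1.0
        var = 4.0 * p * (1.0 - p)
        # Interference statistics at node i for symbol k (sum over j != k).
        mu = (H * mean).sum(axis=1, keepdims=True) - H * mean
        s2 = (H**2 * var).sum(axis=1, keepdims=True) - H**2 * var + noise_var
        llr = 2.0 * H * (y[:, None] - mu) / s2  # real-Gaussian LLR (assumption)
        ext = llr.sum(axis=0, keepdims=True) - llr   # sum over l != i, as in (6)
        p = 1.0 / (1.0 + np.exp(-np.clip(ext, -30.0, 30.0)))
    return np.where(llr.sum(axis=0) >= 0, 1.0, -1.0)  # decision as in (7)

def hybrid_steps_2_and_3(H, y, x_rts, M, noise_var):
    """Cancel the non-LSB interference, re-estimate the LSBs with BP, rebuild x."""
    B = symbols_to_bits(x_rts, M)
    weights = 2.0 ** np.arange(1, B.shape[0])
    higher = (weights[:, None] * B[1:]).sum(axis=0)   # sum_{j>=1} 2^j b^(j)
    y_tilde = y - H @ higher                          # (14) and (15)
    b0 = bp_gai(H, y_tilde, noise_var)                # Step 3: LSB estimate
    return b0 + higher                                # (16)
```

If the higher-order bits at the RTS output are correct, the cancelled signal reduces to $\mathbf{H}\mathbf{b}^{(0)} + \mathbf{n}$, which is exactly the {±1} detection problem that the BP algorithm handles well.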

^3 Because $x_i$'s and $\hat{x}_i$'s take values from the $M$-PAM alphabet, $\hat{x}_i$ is said to be the Euclidean nearest neighbor of $x_i$ if $|x_i - \hat{x}_i| = 2$.

Fig. 4. Uncoded BER comparison between the proposed hybrid RTS-BP and the RTS for 16- and 64-QAM in 32 × 32 V-BLAST. RTS-BP performs 3.6 dB better than RTS at $10^{-3}$ BER for 64-QAM.



The algorithm is stopped after a certain number of iterations between RTS and BP. Our simulations showed that two iterations between RTS and BP are adequate to achieve good improvement; more than two iterations resulted in only marginal improvement for the system parameters considered in the simulations. Since the complexity of the BP part of RTS-BP is less than that of the RTS part, the order of complexity of RTS-BP is the same as that of RTS.

IV. BER Performance of the Hybrid RTS-BP 
Detector 

In this section, we present the uncoded and coded BER performance of the proposed RTS-BP algorithm evaluated through simulations. Perfect knowledge of $\mathbf{H}$ is assumed at the receiver.

Performance in large V-BLAST Systems: Figure 4 shows the uncoded BER performance of 32 × 32 V-BLAST with 16- and 64-QAM. Performance of both RTS-BP as well as RTS are shown. It can be seen that, at an uncoded BER of $10^{-3}$, RTS-BP performs better than RTS by about 3.6 dB for 64-QAM and by about 1.6 dB for 16-QAM. This illustrates the effectiveness of the proposed hybrid RTS-BP approach. Also, this improvement in uncoded BER is found to result in improved coded BER as well, as illustrated in Fig. 5. In Fig. 5, we have plotted the turbo coded BER of RTS-BP and RTS in 32 × 32 V-BLAST with 64-QAM for rate-1/2 (96 bps/Hz) and rate-3/4 (144 bps/Hz) turbo codes. It can be seen that, at a coded BER of $3 \times 10^{-4}$, RTS-BP performs better than RTS by about 1.5 dB at 96 bps/Hz and by about 2.5 dB at 144 bps/Hz.
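The spectral efficiencies quoted above follow from the number of transmit antennas, the bits per QAM symbol, and the code rate. The short check below assumes one symbol per transmit antenna per channel use (as in V-BLAST); the helper name `spectral_efficiency` is hypothetical.

```python
import math

def spectral_efficiency(n_tx, qam_order, code_rate):
    """Bits per channel use: antennas x bits per symbol x code rate."""
    return n_tx * math.log2(qam_order) * code_rate

rate_half = spectral_efficiency(32, 64, 1 / 2)            # 32 * 6 * 0.5  = 96.0
rate_three_quarters = spectral_efficiency(32, 64, 3 / 4)  # 32 * 6 * 0.75 = 144.0
```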

Performance in large non-orthogonal STBC MIMO systems: We also evaluated the BER performance of large non-orthogonal STBC MIMO systems with higher-order QAM using RTS-BP detection. Figure 6 shows the uncoded BER of 8 × 8 and 16 × 16 non-orthogonal STBC from cyclic division algebra for 16-QAM. Here again, we can see that RTS-BP achieves better performance than RTS.

Fig. 5. Coded BER performance comparison between the proposed hybrid RTS-BP and the RTS in 32 × 32 V-BLAST with 64-QAM, i) rate-1/2 turbo code (96 bps/Hz), ii) rate-3/4 turbo code (144 bps/Hz). RTS-BP performs better by about 1.5 dB and 2.5 dB, respectively, at these spectral efficiencies, at $3 \times 10^{-4}$ coded BER.

Performance in frequency-selective large V-BLAST systems: We note that the performance plots in Figs. 4 to 6 are for frequency-flat fading, which could be the fading scenario in MIMO-OFDM systems where a frequency-selective fading channel is converted to frequency-flat channels on multiple subcarriers. RTS-BP, RTS, and LAS algorithms, being suited to work well in large dimensions, can be applied to equalize signals in frequency-selective channels in large-MIMO systems. Following the equivalent real-valued system model of the form in (2) for frequency-selective MIMO systems developed in [16], we evaluated the performance of RTS-BP, RTS, and LAS algorithms in 16 × 16 V-BLAST with 16-QAM on a frequency-selective channel with $L = 6$ equal energy multipath components and $K = 64$ symbols per frame. Figure 7 shows the superior performance of the RTS-BP algorithm over the RTS and LAS algorithms in this frequency-selective 16 × 16 large-MIMO system with 16-QAM.

Fig. 6. Uncoded BER performance comparison between the hybrid RTS-BP and the RTS for 16-QAM in 8 × 8 and 16 × 16 non-orthogonal STBCs from CDA in [15].

Fig. 7. Uncoded BER performance comparison between the hybrid RTS-BP, RTS, and LAS algorithms in 16 × 16 V-BLAST with 16-QAM in frequency-selective fading with $L = 6$, $K = 64$, and uniform power-delay profile.



V. Conclusions 

We proposed a hybrid algorithm that exploited the good features of the RTS and BP algorithms to achieve improved bit error performance and nearness to capacity performance for $M$-QAM signals in large-MIMO systems at practically affordable low complexities. We illustrated the performance gains of the proposed hybrid approach over the RTS algorithm in flat-fading as well as frequency-selective fading for large V-BLAST as well as large non-orthogonal STBC MIMO systems. We note (e.g., from the performance plots for 64-QAM in Figs. 1 and 5) that further improvement in performance beyond what is achieved by the proposed hybrid RTS-BP algorithm could be possible. Investigation of alternate detection strategies to achieve this possible improvement is a subject for further investigation.